Description: The data consist of 200 subjects from a larger study on the survival of patients following admission to an adult intensive care unit (ICU). The study used logistic regression to predict the probability of survival for these patients until their discharge from the hospital. The dependent variable is the binary variable Vital Status (STA, stored here as the Status column). Nineteen possible predictor variables, both discrete and continuous, were also observed. Number of cases: 200. Variable names:
> # ICU <- read.table("C:/Users/ekene/OneDrive - McMaster University/Avenue2Learn_Winter2020/EH 705/eh705termproject/ICUAdmissions.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
> ICU <- read.table("./ICUAdmissions.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
> str(ICU)
'data.frame': 200 obs. of 21 variables:
$ ID : int 8 12 14 28 32 38 40 41 42 50 ...
$ Status : int 0 0 0 0 0 0 0 0 0 0 ...
$ Age : int 27 59 77 54 87 69 63 30 35 70 ...
$ Sex : int 1 0 0 0 1 0 0 1 0 1 ...
$ Race : int 1 1 1 1 1 1 1 1 2 1 ...
$ Service : int 0 0 1 0 1 0 1 0 0 1 ...
$ Cancer : int 0 0 0 0 0 0 0 0 0 1 ...
$ Renal : int 0 0 0 0 0 0 0 0 0 0 ...
$ Infection : int 1 0 0 1 1 1 0 0 0 0 ...
$ CPR : int 0 0 0 0 0 0 0 0 0 0 ...
$ Systolic : int 142 112 100 142 110 110 104 144 108 138 ...
$ HeartRate : int 88 80 70 103 154 132 66 110 60 103 ...
$ Previous : int 0 1 0 0 1 0 0 0 0 0 ...
$ Type : int 1 1 0 1 1 1 0 1 1 0 ...
$ Fracture : int 0 0 0 1 0 0 0 0 0 0 ...
$ PO2 : int 0 0 0 0 0 1 0 0 0 0 ...
$ PH : int 0 0 0 0 0 0 0 0 0 0 ...
$ PCO2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ Bicarbonate : int 0 0 0 0 0 1 0 0 0 0 ...
$ Creatinine : int 0 0 0 0 0 0 0 0 0 0 ...
$ Consciousness: int 1 1 1 1 1 1 1 1 1 1 ...
> summary(ICU)
ID Status Age Sex Race
Min. : 4.0 Min. :0.0 Min. :16.00 Min. :0.00 Min. :1.000
1st Qu.:210.2 1st Qu.:0.0 1st Qu.:46.75 1st Qu.:0.00 1st Qu.:1.000
Median :412.5 Median :0.0 Median :63.00 Median :0.00 Median :1.000
Mean :444.8 Mean :0.2 Mean :57.55 Mean :0.38 Mean :1.175
3rd Qu.:671.8 3rd Qu.:0.0 3rd Qu.:72.00 3rd Qu.:1.00 3rd Qu.:1.000
Max. :929.0 Max. :1.0 Max. :92.00 Max. :1.00 Max. :3.000
Service Cancer Renal Infection CPR
Min. :0.000 Min. :0.0 Min. :0.000 Min. :0.00 Min. :0.000
1st Qu.:0.000 1st Qu.:0.0 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000
Median :1.000 Median :0.0 Median :0.000 Median :0.00 Median :0.000
Mean :0.535 Mean :0.1 Mean :0.095 Mean :0.42 Mean :0.065
3rd Qu.:1.000 3rd Qu.:0.0 3rd Qu.:0.000 3rd Qu.:1.00 3rd Qu.:0.000
Max. :1.000 Max. :1.0 Max. :1.000 Max. :1.00 Max. :1.000
Systolic HeartRate Previous Type
Min. : 36.0 Min. : 39.00 Min. :0.00 Min. :0.000
1st Qu.:110.0 1st Qu.: 80.00 1st Qu.:0.00 1st Qu.:0.000
Median :130.0 Median : 96.00 Median :0.00 Median :1.000
Mean :132.3 Mean : 98.92 Mean :0.15 Mean :0.735
3rd Qu.:150.0 3rd Qu.:118.25 3rd Qu.:0.00 3rd Qu.:1.000
Max. :256.0 Max. :192.00 Max. :1.00 Max. :1.000
Fracture PO2 PH PCO2 Bicarbonate
Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.0 Min. :0.000
1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.0 1st Qu.:0.000
Median :0.000 Median :0.00 Median :0.000 Median :0.0 Median :0.000
Mean :0.075 Mean :0.08 Mean :0.065 Mean :0.1 Mean :0.075
3rd Qu.:0.000 3rd Qu.:0.00 3rd Qu.:0.000 3rd Qu.:0.0 3rd Qu.:0.000
Max. :1.000 Max. :1.00 Max. :1.000 Max. :1.0 Max. :1.000
Creatinine Consciousness
Min. :0.00 Min. :1.000
1st Qu.:0.00 1st Qu.:1.000
Median :0.00 Median :1.000
Mean :0.05 Mean :1.125
3rd Qu.:0.00 3rd Qu.:1.000
Max. :1.00 Max. :3.000
From the ICU Admissions dataset, I made the following observations:
1. All 21 variables are read in as integers, but according to the data description most of them are categorical codes.
2. Only four variables (ID, Age, Systolic and HeartRate) can be left as numeric; the others need to be recoded as factors.
3. The dependent variable is the binary variable Vital Status (Status).
4. Nineteen possible predictor variables, both discrete and continuous, were also observed.
5. There are no missing data.
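The observations above can be checked programmatically. The following is a minimal sketch on a small made-up data frame (`toy` is hypothetical, not the ICU data): it counts missing values per column and the number of distinct values per column, which is one simple heuristic for spotting integer columns that are really categorical codes. The cut-off of three distinct values is an assumption chosen for this toy example only.

```r
# Hypothetical toy data standing in for a coded data set (not the ICU data)
toy <- data.frame(
  Age    = c(27L, 59L, 77L, 54L, 87L, 69L),
  Status = c(0L, 0L, 1L, 0L, 1L, 0L),
  Race   = c(1L, 1L, 2L, 1L, 3L, 1L)
)

# Missing values per column; all zeros means there are no missing data
n_missing <- colSums(is.na(toy))

# Distinct values per column; few distinct values suggests a categorical code
n_distinct <- sapply(toy, function(x) length(unique(x)))

# Heuristic cut-off (an assumption for this example; adjust for real data)
factor_candidates <- names(n_distinct)[n_distinct <= 3]

print(n_missing)
print(n_distinct)
print(factor_candidates)
```

On real data the heuristic only suggests candidates; the codebook remains the authority on which variables are categorical.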
Labelling the factor levels makes the comparative analysis and the plots easier to read.
> ICU <- within(ICU, {
+ Status <- factor(Status, labels=c('Lived','Died'))
+ Sex <- factor(Sex, labels=c('Male','Female'))
+ Race <- factor(Race, labels=c('White','Black','Other'))
+ Service <- factor(Service, labels=c('Medical','Surgical'))
+ Cancer <- factor(Cancer, labels=c('No','Yes'))
+ Renal <- factor(Renal, labels=c('No','Yes'))
+ Infection <- factor(Infection, labels=c('No','Yes'))
+ CPR <- factor(CPR, labels=c('No','Yes'))
+ Previous <- factor(Previous, labels=c('No','Yes'))
+ Type <- factor(Type, labels=c('Elective','Emergency'))
+ Fracture <- factor(Fracture, labels=c('No','Yes'))
+ PCO2 <- factor(PCO2, labels=c('No','Yes'))
+ PH <- factor(PH, labels=c('No','Yes'))
+ PO2 <- factor(PO2, labels=c('No','Yes'))
+ Bicarbonate <- factor(Bicarbonate, labels=c('No','Yes'))
+ Creatinine <- factor(Creatinine, labels=c('No','Yes'))
+ Consciousness <- factor(Consciousness, labels=c('Conscious','Deep Stupor','Coma'))
+ })
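When `factor()` is called with `labels` only, the labels are assigned to the sorted unique codes, so the mapping depends on which codes happen to be present. A hedged sketch of a more defensive variant is shown below: it states `levels` explicitly and cross-tabulates the raw codes against the new labels. The vector `sex_raw` is made up for illustration and only mirrors the 0/1 coding used for Sex above.

```r
# Hypothetical raw codes (0 = Male, 1 = Female), mirroring the Sex coding above
sex_raw <- c(1L, 0L, 0L, 0L, 1L)

# Stating levels explicitly ties each code to its label,
# even if one of the codes is absent from the data
sex <- factor(sex_raw, levels = c(0, 1), labels = c("Male", "Female"))

# Cross-tabulate raw codes against labels to confirm the mapping
check <- table(raw = sex_raw, recoded = sex)
print(check)
```

All counts should fall on the diagonal of the table; any off-diagonal count would indicate a mislabelled code.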
> headTail(ICU) %>% datatable(rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=TRUE)) %>% formatRound(columns=c(1:17), digits=0)
> # write.csv(ICU, file="ICUAdmissions_recoded.csv", row.names=FALSE)
> summary(ICU)
ID Status Age Sex Race
Min. : 4.0 Lived:160 Min. :16.00 Male :124 White:175
1st Qu.:210.2 Died : 40 1st Qu.:46.75 Female: 76 Black: 15
Median :412.5 Median :63.00 Other: 10
Mean :444.8 Mean :57.55
3rd Qu.:671.8 3rd Qu.:72.00
Max. :929.0 Max. :92.00
Service Cancer Renal Infection CPR Systolic
Medical : 93 No :180 No :181 No :116 No :187 Min. : 36.0
Surgical:107 Yes: 20 Yes: 19 Yes: 84 Yes: 13 1st Qu.:110.0
Median :130.0
Mean :132.3
3rd Qu.:150.0
Max. :256.0
HeartRate Previous Type Fracture PO2 PH
Min. : 39.00 No :170 Elective : 53 No :185 No :184 No :187
1st Qu.: 80.00 Yes: 30 Emergency:147 Yes: 15 Yes: 16 Yes: 13
Median : 96.00
Mean : 98.92
3rd Qu.:118.25
Max. :192.00
PCO2 Bicarbonate Creatinine Consciousness
No :180 No :185 No :190 Conscious :185
Yes: 20 Yes: 15 Yes: 10 Deep Stupor: 5
Coma : 10
> str(ICU)
'data.frame': 200 obs. of 21 variables:
$ ID : int 8 12 14 28 32 38 40 41 42 50 ...
$ Status : Factor w/ 2 levels "Lived","Died": 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 27 59 77 54 87 69 63 30 35 70 ...
$ Sex : Factor w/ 2 levels "Male","Female": 2 1 1 1 2 1 1 2 1 2 ...
$ Race : Factor w/ 3 levels "White","Black",..: 1 1 1 1 1 1 1 1 2 1 ...
$ Service : Factor w/ 2 levels "Medical","Surgical": 1 1 2 1 2 1 2 1 1 2 ...
$ Cancer : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
$ Renal : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Infection : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 1 1 1 1 ...
$ CPR : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Systolic : int 142 112 100 142 110 110 104 144 108 138 ...
$ HeartRate : int 88 80 70 103 154 132 66 110 60 103 ...
$ Previous : Factor w/ 2 levels "No","Yes": 1 2 1 1 2 1 1 1 1 1 ...
$ Type : Factor w/ 2 levels "Elective","Emergency": 2 2 1 2 2 2 1 2 2 1 ...
$ Fracture : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
$ PO2 : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
$ PH : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ PCO2 : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Bicarbonate : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
$ Creatinine : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Consciousness: Factor w/ 3 levels "Conscious","Deep Stupor",..: 1 1 1 1 1 1 1 1 1 1 ...
> numSummary(ICU[,c("Age", "HeartRate", "Systolic"), drop=FALSE], statistics=c("mean", "sd", "IQR",
+ "quantiles"), quantiles=c(0,.25,.5,.75,1))
mean sd IQR 0% 25% 50% 75% 100% n
Age 57.545 20.05465 25.25 16 46.75 63 72.00 92 200
HeartRate 98.925 26.82962 38.25 39 80.00 96 118.25 192 200
Systolic 132.280 32.95210 40.00 36 110.00 130 150.00 256 200
> p01<-ggplot(ICU, aes(x=Sex )) +
+ geom_bar( fill="pink" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-5, label="Base=200", size=4, color="black" )
>
> p02<-ggplot(ICU, aes(x=Race )) +
+ geom_bar( fill="lightblue" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p03<-ggplot(ICU, aes(x=Service )) +
+ geom_bar( fill="blue" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p04 <- ggplot(ICU, aes(x=Cancer )) +
+ geom_bar( fill="Green" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p05 <- ggplot(ICU, aes(x=Renal )) +
+ geom_bar( fill="Orange" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p06 <- ggplot(ICU, aes(x=Infection )) +
+ geom_bar( fill="Red" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p07 <- ggplot(ICU, aes(x=CPR )) +
+ geom_bar( fill="Yellow" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p08 <- ggplot(ICU, aes(x=Previous )) +
+ geom_bar( fill="Purple" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> library(Rmisc)
> multiplot(p01, p02, p03, p04, p05, p06, p07, p08, layout=matrix(c(1:8), nrow=4, byrow=TRUE))
> p09<-ggplot(ICU, aes(x=Type )) +
+ geom_bar( fill="pink" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-5, label="Base=200", size=4, color="black" )
>
> p10<-ggplot(ICU, aes(x=Fracture )) +
+ geom_bar( fill="lightblue" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p11<-ggplot(ICU, aes(x=PO2 )) +
+ geom_bar( fill="blue" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p12 <- ggplot(ICU, aes(x=PH )) +
+ geom_bar( fill="Green" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p13 <- ggplot(ICU, aes(x=PCO2 )) +
+ geom_bar( fill="Orange" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p14 <- ggplot(ICU, aes(x=Bicarbonate )) +
+ geom_bar( fill="Red" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p15 <- ggplot(ICU, aes(x=Creatinine )) +
+ geom_bar( fill="Yellow" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> p16 <- ggplot(ICU, aes(x=Consciousness )) +
+ geom_bar( fill="Purple" ) +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=.8, y=-1, label="Base=200", size=4)
>
> library(Rmisc)
> multiplot(p09, p10, p11, p12, p13, p14, p15, p16, layout=matrix(c(1:8), nrow=4, byrow=TRUE))
> library(ggplot2)
> f01<-ggplot(ICU, aes(x=Sex, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Sex")
>
> f02<-ggplot(ICU, aes(x=Race, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Race")
> f03<-ggplot(ICU, aes(x=Service, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Service")
>
> f04<-ggplot(ICU, aes(x=Cancer, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Cancer")
>
> library(Rmisc)
> multiplot(f01, f02, f03, f04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> f05<-ggplot(ICU, aes(x=Renal, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Renal")
>
> f06<-ggplot(ICU, aes(x=Infection, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Infection")
>
> f07<-ggplot(ICU, aes(x=CPR, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by CPR")
>
> f08<-ggplot(ICU, aes(x=Previous, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Previous")
>
> library(Rmisc)
> multiplot(f05, f06, f07, f08, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> f09<-ggplot(ICU, aes(x=Type, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Type")
>
> f10<-ggplot(ICU, aes(x=Fracture, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Fracture")
>
> f11<-ggplot(ICU, aes(x=PO2, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by PO2")
>
> f12<-ggplot(ICU, aes(x=PH, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by PH")
>
> library(Rmisc)
> multiplot(f09, f10, f11, f12, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> f13<-ggplot(ICU, aes(x=PCO2, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by PCO2")
>
> f14<-ggplot(ICU, aes(x=Bicarbonate, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Bicarbonate")
>
> f15<-ggplot(ICU, aes(x=Creatinine, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Creatinine")
>
> f16<-ggplot(ICU, aes(x=Consciousness, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Consciousness")
>
>
> library(Rmisc)
> multiplot(f13, f14, f15, f16, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> d01 <- ggplot(ICU, aes(x=Age)) +
+ geom_density(fill="green") +
+ ggtitle("Age") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=200", size=4)
>
> d02 <- ggplot(ICU, aes(x=Systolic)) +
+ geom_density(fill="green") +
+ ggtitle("Systolic") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=200", size=4)
>
> d03 <- ggplot(ICU, aes(x=HeartRate)) +
+ geom_density(fill="green") +
+ ggtitle("HeartRate") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=200", size=4)
>
> d04 <- ggplot(ICU, aes(x=ID)) +
+ geom_density(fill="green") +
+ ggtitle("ID") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=200", size=4)
>
> multiplot(d01, d02, d03, d04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> n01<-ggplot(ICU, aes(x=Age, fill = Status)) +
+ theme_bw() +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status by Age")
>
> n02<-ggplot(ICU, aes(x=Systolic, fill = Status)) +
+ theme_bw() +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status by Systolic")
>
> n03<-ggplot(ICU, aes(x=HeartRate, fill = Status)) +
+ theme_bw() +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status by HeartRate")
>
> n04<-ggplot(ICU, aes(x=Age, fill = Status)) +
+ theme_bw() +
+ facet_wrap(~ Sex) +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status in male and female patients by Age")
>
> multiplot(n01, n02, n03, n04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
================================================================================
Variable p_1 p_10 p_25 p_50 p_75 p_90 p_99
1 Systolic 55.92 92 110 130 150 170 212.12
2 Age 16.99 21 46.75 63 72 78 91
3 HeartRate 45.98 65 80 96 118.25 136.1 162.08
4 ID 11.96 81.3 210.25 412.5 671.75 829.8 924.01
> with(ICU, qqPlot(Systolic, dist="norm", id=list(method="y", n=2,
+ labels=rownames(ICU)), main="Systolic"))
[1] 200 179
> normalityTest(~Systolic, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: Systolic
W = 0.98369, p-value = 0.0204
> with(ICU, qqPlot(HeartRate, dist="norm", id=list(method="y", n=2,
+ labels=rownames(ICU)), main="Heartrate"))
[1] 125 48
> normalityTest(~HeartRate, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: HeartRate
W = 0.98598, p-value = 0.04478
> with(ICU, qqPlot(Age, dist="norm", id=list(method="y", n=2,
+ labels=rownames(ICU)), main="Age"))
[1] 23 97
> normalityTest(~Age, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: Age
W = 0.92836, p-value = 2.507e-08
> numSummary(ICU[,c("Age", "Systolic", "HeartRate"), drop=FALSE], groups=ICU$Status, statistics=c("mean", "sd", "se(mean)"),quantiles=c(0,.25, .5, .75,1))
Variable: Age
mean sd se(mean) n
Lived 55.650 20.42818 1.614990 160
Died 65.125 16.64900 2.632438 40
Variable: Systolic
mean sd se(mean) n
Lived 135.6438 29.80151 2.356016 160
Died 118.8250 41.08084 6.495451 40
Variable: HeartRate
mean sd se(mean) n
Lived 98.500 26.97868 2.132852 160
Died 100.625 26.49304 4.188918 40
> library(psych)
> describeBy(ICU, ICU$Status)
Descriptive statistics by group
group: Lived
vars n mean sd median trimmed mad min max range skew
ID 1 160 457.04 276.35 438.5 454.25 346.19 8 929 921 0.07
Status* 2 160 1.00 0.00 1.0 1.00 0.00 1 1 0 NaN
Age 3 160 55.65 20.43 61.0 56.86 19.27 16 91 75 -0.55
Sex* 4 160 1.38 0.49 1.0 1.34 0.00 1 2 1 0.51
Race* 5 160 1.19 0.50 1.0 1.05 0.00 1 3 2 2.65
Service* 6 160 1.58 0.49 2.0 1.60 0.00 1 2 1 -0.33
Cancer* 7 160 1.10 0.30 1.0 1.00 0.00 1 2 1 2.64
Renal* 8 160 1.07 0.25 1.0 1.00 0.00 1 2 1 3.38
Infection* 9 160 1.38 0.49 1.0 1.34 0.00 1 2 1 0.51
CPR* 10 160 1.04 0.19 1.0 1.00 0.00 1 2 1 4.82
Systolic 11 160 135.64 29.80 132.0 133.97 29.65 48 224 176 0.42
HeartRate 12 160 98.50 26.98 95.0 97.48 25.20 39 192 153 0.44
Previous* 13 160 1.14 0.35 1.0 1.05 0.00 1 2 1 2.01
Type* 14 160 1.68 0.47 2.0 1.73 0.00 1 2 1 -0.77
Fracture* 15 160 1.07 0.26 1.0 1.00 0.00 1 2 1 3.20
PO2* 16 160 1.07 0.25 1.0 1.00 0.00 1 2 1 3.38
PH* 17 160 1.06 0.23 1.0 1.00 0.00 1 2 1 3.82
PCO2* 18 160 1.10 0.30 1.0 1.00 0.00 1 2 1 2.64
Bicarbonate* 19 160 1.06 0.24 1.0 1.00 0.00 1 2 1 3.58
Creatinine* 20 160 1.03 0.17 1.0 1.00 0.00 1 2 1 5.34
Consciousness* 21 160 1.02 0.22 1.0 1.00 0.00 1 3 2 8.69
kurtosis se
ID -1.25 21.85
Status* NaN 0.00
Age -0.82 1.61
Sex* -1.75 0.04
Race* 5.98 0.04
Service* -1.91 0.04
Cancer* 5.01 0.02
Renal* 9.46 0.02
Infection* -1.75 0.04
CPR* 21.40 0.02
Systolic 0.34 2.36
HeartRate 0.17 2.13
Previous* 2.06 0.03
Type* -1.41 0.04
Fracture* 8.27 0.02
PO2* 9.46 0.02
PH* 12.64 0.02
PCO2* 5.01 0.02
Bicarbonate* 10.89 0.02
Creatinine* 26.66 0.01
Consciousness* 74.04 0.02
------------------------------------------------------------
group: Died
vars n mean sd median trimmed mad min max range skew
ID 1 40 395.95 250.74 363 386.72 243.89 4 921 917 0.37
Status* 2 40 2.00 0.00 2 2.00 0.00 2 2 0 NaN
Age 3 40 65.12 16.65 68 66.47 11.86 19 92 73 -0.84
Sex* 4 40 1.40 0.50 1 1.38 0.00 1 2 1 0.39
Race* 5 40 1.12 0.46 1 1.00 0.00 1 3 2 3.46
Service* 6 40 1.35 0.48 1 1.31 0.00 1 2 1 0.61
Cancer* 7 40 1.10 0.30 1 1.00 0.00 1 2 1 2.57
Renal* 8 40 1.20 0.41 1 1.12 0.00 1 2 1 1.44
Infection* 9 40 1.60 0.50 2 1.62 0.00 1 2 1 -0.39
CPR* 10 40 1.18 0.38 1 1.09 0.00 1 2 1 1.65
Systolic 11 40 118.83 41.08 126 117.22 32.62 36 256 220 0.60
HeartRate 12 40 100.62 26.49 96 99.78 25.20 55 160 105 0.28
Previous* 13 40 1.18 0.38 1 1.09 0.00 1 2 1 1.65
Type* 14 40 1.95 0.22 2 2.00 0.00 1 2 1 -3.98
Fracture* 15 40 1.07 0.27 1 1.00 0.00 1 2 1 3.11
PO2* 16 40 1.12 0.33 1 1.03 0.00 1 2 1 2.18
PH* 17 40 1.10 0.30 1 1.00 0.00 1 2 1 2.57
PCO2* 18 40 1.10 0.30 1 1.00 0.00 1 2 1 2.57
Bicarbonate* 19 40 1.12 0.33 1 1.03 0.00 1 2 1 2.18
Creatinine* 20 40 1.12 0.33 1 1.03 0.00 1 2 1 2.18
Consciousness* 21 40 1.52 0.82 1 1.41 0.00 1 3 2 1.03
kurtosis se
ID -1.00 39.65
Status* NaN 0.00
Age 0.73 2.63
Sex* -1.89 0.08
Race* 10.72 0.07
Service* -1.67 0.08
Cancer* 4.71 0.05
Renal* 0.09 0.06
Infection* -1.89 0.08
CPR* 0.73 0.06
Systolic 1.40 6.50
HeartRate -0.75 4.19
Previous* 0.73 0.06
Type* 14.16 0.03
Fracture* 7.85 0.04
PO2* 2.84 0.05
PH* 4.71 0.05
PCO2* 4.71 0.05
Bicarbonate* 2.84 0.05
Creatinine* 2.84 0.05
Consciousness* -0.74 0.13
Ho: the variances for Lived and Died are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Status groups for Age.
> with(ICU, tapply(Age, Status, var, na.rm=TRUE))
Lived Died
417.3107 277.1891
The following code runs Levene's test for homogeneity of variances.
> leveneTest(Age ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 3.127 0.07855 .
198
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (0.07855) for Levene's test is greater than 0.05, we fail to reject the null hypothesis at the 5% significance level and treat the variances for Lived and Died as equal. Accordingly, the Student t-test (pooled variance) is used to test for a difference in means.
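Levene's test with `center="median"` amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that idea in base R and then uses the result to choose between the pooled and Welch t-tests. The data are simulated (the means and standard deviations are made-up values loosely resembling the Age summaries), so the numbers will not match the ICU output.

```r
set.seed(1)  # simulated data, not the ICU sample
g <- factor(rep(c("Lived", "Died"), times = c(160, 40)),
            levels = c("Lived", "Died"))
x <- c(rnorm(160, mean = 56, sd = 20), rnorm(40, mean = 65, sd = 17))

# Absolute deviation of each value from its own group median
abs_dev <- abs(x - ave(x, g, FUN = median))

# One-way ANOVA on the deviations gives the median-centred Levene F test
lev <- anova(lm(abs_dev ~ g))
p_levene <- lev[["Pr(>F)"]][1]

# Use the pooled (Student) test only when equal variances are not rejected
use_pooled <- p_levene > 0.05
res <- t.test(x ~ g, var.equal = use_pooled)

print(lev)
print(res$method)
```

Some analysts skip the preliminary variance test and use the Welch test throughout; the two-step rule here simply mirrors the procedure followed in this analysis.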
> t.test(Age~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)
Two Sample t-test
data: Age by Status
t = -2.7151, df = 198, p-value = 0.007211
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-16.35688 -2.59312
sample estimates:
mean in group Lived mean in group Died
55.650 65.125
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548
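Because the p-value (0.007211) is below 0.05 and the 95% confidence interval (-16.35688 to -2.59312) excludes 0, we reject the null hypothesis at the 5% significance level and conclude that mean Age differs between the Lived and Died groups. The critical value printed above was computed with df = 314, which does not match this test; with the test's df = 198 the two-sided critical value is approximately ±1.972, and t = -2.7151 still falls below -1.972, so the conclusion is unchanged.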
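The reported t statistic, confidence interval and p-value can be reproduced from the group summaries printed earlier, using the test's own degrees of freedom (n1 + n2 - 2 = 198). The following is a minimal sketch of the pooled-variance calculation; the variable names are illustrative, and the inputs are the rounded means and variances from the output above.

```r
# Summary statistics copied from the output above
n1 <- 160; m1 <- 55.650; v1 <- 417.3107   # Lived
n2 <- 40;  m2 <- 65.125; v2 <- 277.1891   # Died

df  <- n1 + n2 - 2                             # degrees of freedom of the pooled test
sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / df    # pooled variance
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))           # standard error of the difference
t_stat <- (m1 - m2) / se

crit <- qt(0.975, df = df)                     # two-sided 5% critical value
ci <- (m1 - m2) + c(-1, 1) * crit * se         # 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df = df)

print(c(t = t_stat, df = df, crit = crit, p = p_value))
print(ci)
```

Taking the degrees of freedom from the test itself (for a fitted test object, `t.test(...)$parameter`) avoids hard-coding a value that belongs to a different sample size.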
Ho: the variances for Lived and Died are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Status groups for HeartRate.
> with(ICU, tapply(HeartRate, Status, var, na.rm=TRUE))
Lived Died
727.8491 701.8814
The following code runs Levene's test for homogeneity of variances.
> leveneTest(HeartRate ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 0.008 0.929
198
Since the p-value (0.929) for Levene's test is greater than 0.05, we fail to reject the null hypothesis at the 5% significance level and treat the variances for Lived and Died as equal. Accordingly, the Student t-test (pooled variance) is used to test for a difference in means.
> t.test(HeartRate~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)
Two Sample t-test
data: HeartRate by Status
t = -0.44714, df = 198, p-value = 0.6553
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.496845 7.246845
sample estimates:
mean in group Lived mean in group Died
98.500 100.625
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548
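Because the p-value (0.6553) is above 0.05 and the 95% confidence interval (-11.496845 to 7.246845) includes 0, we fail to reject the null hypothesis at the 5% significance level: there is no evidence that mean HeartRate differs between those who lived and those who died. The critical value printed above was computed with df = 314 rather than the test's df = 198, for which the two-sided critical value is approximately ±1.972; t = -0.44714 lies well inside that range, so the conclusion is unchanged.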
Ho: the variances for Lived and Died are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Status groups for Systolic.
> with(ICU, tapply(Systolic, Status, var, na.rm=TRUE))
Lived Died
888.1301 1687.6353
The following code runs Levene's test for homogeneity of variances.
> leveneTest(Systolic ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 4.1872 0.04205 *
198
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (0.04205) for Levene's test is less than 0.05, we reject the null hypothesis at the 5% significance level and conclude that the variances for Lived and Died are not equal. Accordingly, the Welch two-sample t-test is used to test for a difference in means.
> t.test(Systolic~Status, alternative="two.sided", conf.level=.95, var.equal=FALSE, data=ICU)
Welch Two Sample t-test
data: Systolic by Status
t = 2.4341, df = 49.726, p-value = 0.01856
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.938642 30.698858
sample estimates:
mean in group Lived mean in group Died
135.6438 118.8250
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548
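Because the p-value (0.01856) is below 0.05 and the 95% confidence interval (2.938642 to 30.698858) excludes 0, we reject the null hypothesis at the 5% significance level and conclude that mean systolic blood pressure differs between those who lived and those who died. The critical value printed above was computed with df = 314; with the Welch df = 49.726 the two-sided critical value is approximately ±2.009, and t = 2.4341 still exceeds it, so the conclusion is unchanged.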
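The fractional degrees of freedom in the Welch output come from the Welch-Satterthwaite approximation. As a hedged check, the sketch below recomputes the t statistic, the degrees of freedom and the matching critical value from the rounded group means and variances printed earlier; the variable names are illustrative.

```r
# Summary statistics copied from the output above
n1 <- 160; m1 <- 135.6438; v1 <- 888.1301    # Lived
n2 <- 40;  m2 <- 118.8250; v2 <- 1687.6353   # Died

a <- v1 / n1                      # squared standard error, Lived
b <- v2 / n2                      # squared standard error, Died
se <- sqrt(a + b)                 # unpooled standard error of the difference
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (a + b)^2 / (a^2 / (n1 - 1) + b^2 / (n2 - 1))

crit <- qt(0.975, df = df_welch)  # two-sided 5% critical value for this df
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, crit = crit, p = p_value))
```

Because the smaller group (Died, n = 40) also has the larger variance, the approximate degrees of freedom fall far below the pooled value of 198.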
Ho: the variances for Male and Female are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Sex groups for Age.
> with(ICU, tapply(Age, Sex, var, na.rm=TRUE))
Male Female
378.4780 436.5867
The following code runs Levene's test for homogeneity of variances.
> leveneTest(Age ~ Sex, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 0.1154 0.7344
198
Since the p-value (0.7344) for Levene's test is greater than 0.05, we fail to reject the null hypothesis at the 5% significance level and treat the variances for Male and Female as equal. Accordingly, the Student t-test (pooled variance) is used to test for a difference in means.
> t.test(Age~Sex, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)
Two Sample t-test
data: Age by Sex
t = -1.3582, df = 198, p-value = 0.1759
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.708824 1.789469
sample estimates:
mean in group Male mean in group Female
56.04032 60.00000
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548